(19) Europäisches Patentamt
     European Patent Office
     Office européen des brevets

(11) EP 0 844 559 A2

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication: 27.05.1998 Bulletin 1998/22

(21) Application number: 9730^471.7

(22) Date of filing: 21.11.1997

(51) Int. Cl.6: G06F 9/46, G06F 15/167

(84) Designated Contracting States:
     AT BE CH DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE
     Designated Extension States:
     AL LT LV MK RO SI

(30) Priority: 22.11.1996 US 754481

(71) Applicant: MangoSoft Corporation
     Westborough, Massachusetts 01581 (US)

(72) Inventors:
     • Carter, John B.
       Westborough, Massachusetts 01581 (US)
     • Abraham, William
       Windham, New Hampshire 03087 (US)
     • Hanson, Thomas G.
       Leominster, Massachusetts 01463 (US)
     • Dietterich, Daniel J.
       Acton, Massachusetts 01720 (US)
     • Davis, Scott H.
       Groton, Massachusetts 01460 (US)
     • Frank, Steven J.
       Hopkinton, Massachusetts 01748 (US)
     • Phillips, Robert S.
       Brookfield, Massachusetts 01606 (US)
     • Porter, David K.
       Littleton, Massachusetts 01460 (US)

(74) Representative: Butler, Michael John
     Frank B. Dehn & Co.,
     European Patent Attorneys,
     179 Queen Victoria Street
     London EC4V 4EL (GB)

(54) Shared memory computer networks

(57) Distributed shared memory systems and processes that can connect into each node of a computer network to encapsulate the memory management operations of the connected nodes and to provide thereby an abstraction of a shared virtual memory that can span across each node of the network, and that optionally spans across each memory device connected to the computer network. Accordingly, each node on the network having the distributed shared memory system of the invention can access the shared memory.

Description

The invention relates to computer systems, and 
more particularly, to computer networking systems and 
methods that provide shared memory systems and 
services. 

The conventional computer network includes a number of client computers connected together and further connected to a server computer that stores the data and the programs that client computers employ during network operation. This configuration is generally referred to as a client-server network. Typically, each client is a conventional computer system that includes a private main memory, typically a RAM memory, and a persistent storage, typically a hard disk. The server is usually an expensive high end machine that includes a high speed processor unit and a large memory, often having ten to one hundred times more storage than the individual client computers. The clients and server cooperate to share data and services among the different users, and to thereby make the individual computers appear as a unified distributed system.

To this end, the server acts as a central controller that provides through its large memory a central repository of network data, and that distributes services to the individual client computers, generally on an as-available basis. Typically, these services are provided by means of specialized software running on a high speed processor.

Although computer networks based on this client-server model have generally been successful at providing users with necessary computer services, as the user demands on computer systems have increased, the weaknesses in the client-server network are beginning to place limits on the services that can be provided.

An additional problem with the client-server network is that it provides a static operating environment that is set for optimal performance at a certain level of network activity. Consequently, the client-server network fails to exploit available resources to improve system performance. In particular, as the system activity rises above or drops below the expected level of network activity, the static operating environment lacks any ability to reconfigure dynamically the allocation of network resources to one providing better performance for the present level of activity.

Moreover, the client-server computer network requires that computer programs written to operate on the client-server network distribute themselves between clients and the server. This requires that the application programs implement a set of functions that divide the program between the clients and the server. This distribution of the application programs requires that the client-server application programs be quite complex. For example, a client-server computer program that shares data between different machines must include functionality that allows for the distribution of multiple copies of data files, the maintenance of coherency for the distributed copies, and other such low-level management services.

Further troubling is that the client-server network stores all important applications and data files in the memory of the server system. Consequently, the client-server network is subject to complete system failure each time the server system crashes.

For the above reasons, among others, the present client-server computer architecture fails to provide an adequate response to the increased demands of today's computer users.

The invention provides systems that can create and manage a virtual memory space that can be shared by each computer on a network and can span the storage space of each memory device connected to the network. Accordingly, all data stored on the network can be stored within the virtual memory space and the actual physical location of the data can be in any of the memory devices connected to the network.

More specifically, the system can create, or receive, a global address signal that represents a portion, for example 4K bytes, of the virtual memory space. The global address signal can be decoupled from, i.e. unrelated to, the physical and virtual address spaces of the underlying computer hardware, to provide support for a memory space large enough to span each volatile and persistent memory device connected to the system. For example, systems of the invention can operate on 32-bit computers, but can employ global address signals that can be 128 bits wide. Accordingly, the virtual memory space spans 2^128 bytes, which is much larger than the 2^32 address space supported by the underlying computer hardware. Such an address space can be large enough to provide a separate address for every byte of data storage on the network, including all RAM, disk and tape storage.
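
As a worked illustration of the address arithmetic above, the following Python sketch splits a 128-bit global address into a page number and a byte offset, assuming 4K-byte pages. The names PAGE_SIZE and split_global_address are hypothetical and do not appear in the specification; the sketch only illustrates the sizes involved.

```python
PAGE_SIZE = 4 * 1024         # 4K-byte portions, as in the example above
GLOBAL_ADDRESS_BITS = 128    # width of a global address signal
LOCAL_ADDRESS_BITS = 32      # width of the underlying hardware address


def split_global_address(addr: int) -> tuple[int, int]:
    """Return (page_number, byte_offset) for a global byte address."""
    if not 0 <= addr < 2 ** GLOBAL_ADDRESS_BITS:
        raise ValueError("address lies outside the 128-bit global space")
    return divmod(addr, PAGE_SIZE)


# Python integers are arbitrary precision, so 128-bit values need no
# special handling even on a 32-bit host.
global_bytes = 2 ** GLOBAL_ADDRESS_BITS
local_bytes = 2 ** LOCAL_ADDRESS_BITS
ratio = global_bytes // local_bytes   # how many 32-bit spaces fit: 2**96
```

The ratio shows that the global space is 2^96 times the size of a 32-bit address space, which is why every byte of storage on a network can receive its own address.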

For such a large virtual memory space, typically only a small portion is storing data at any time. Accordingly, the system includes a directory manager that tracks those portions of the virtual memory space that are in use. The system provides physical memory storage for each portion of the virtual memory space in use by mapping each such portion to a physical memory device, such as a RAM memory or a hard-drive. Optionally, the mapping includes a level of indirection that facilitates data migration, fault-tolerant operation, and load balancing.
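
One way to picture a directory that tracks only the in-use portions of a very large address space is a sparse map keyed by page number. The sketch below is a minimal illustration under that assumption; the class SparseDirectory and its method names are hypothetical and are not taken from the specification.

```python
class SparseDirectory:
    """Tracks only the pages of the virtual memory space that are in use."""

    def __init__(self):
        # page number -> (node name, device kind); unused pages have no entry,
        # so storage grows with the pages in use, not with the address space.
        self._in_use = {}

    def map_page(self, page, node, device):
        """Record that `page` is backed by `device` ('ram' or 'disk') on `node`."""
        if device not in ("ram", "disk"):
            raise ValueError("device must be 'ram' or 'disk'")
        self._in_use[page] = (node, device)

    def unmap_page(self, page):
        """Forget a page that is no longer in use."""
        self._in_use.pop(page, None)

    def lookup(self, page):
        """Return (node, device) for an in-use page, or None if unused."""
        return self._in_use.get(page)

    def pages_in_use(self):
        return len(self._in_use)
```

Because the map holds entries only for pages in use, a page numbered near the top of a 128-bit space costs no more to track than page zero.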

By allowing each computer to monitor and track which portions of the virtual memory space are in use, each computer can share the memory space. This allows the networked computers to appear to have a single memory, and therefore can allow application programs running on different computers to communicate using techniques currently employed to communicate between applications running on the same machine.

In one aspect, the invention can be understood to include computer systems having a shared addressable memory space. The systems can comprise a data network that carries data signals representative of computer readable information, a persistent memory device that couples to the data network and that provides persistent data storage, and plural computers that each have an interface that couples to the data network, for accessing the data network to exchange data signals therewith. Moreover, each of the computers can include a shared memory subsystem for mapping a portion of the addressable memory space to a portion of the persistent storage to provide addressable persistent storage for data signals.

In a system that distributes the storage across the memory devices of the network, the persistent memory device will be understood to include a plurality of local persistent memory devices that each couple to a respective one of the plural computers. To this same end, the system can also include a distributor for mapping portions of the addressable memory space across the plurality of local persistent memory devices and a disk directory manager for tracking the mapped portions of said addressable memory space to provide information representative of the local persistent memory device that stores that portion of said addressable memory space mapped thereon.

The systems can also include a cache system for operating one of the local persistent memory devices as a cache memory for cache storing data signals associated with recently accessed portions of the addressable memory space. Further, the system can include a migration controller for selectively moving portions of the addressable memory space between the local persistent memory devices of the plural computers. The migration controller can determine and respond to data access patterns, resource demands or any other criteria or heuristic suitable for practice with the invention. Accordingly, the migration controller can balance the loads on the network, and move data to nodes from which it is commonly accessed. The cache controller can be a software program running on a host computer to provide a software managed RAM and disk cache. The RAM can be any volatile memory including SRAM, DRAM or any other volatile memory. The disk can be any persistent memory including any disk, RAID, tape or other device that provides persistent data storage.
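
The text leaves the migration heuristic open. Purely as an illustration, the sketch below assumes one simple policy: a page moves to another node once that node has accessed it at least `threshold` more times than the node currently storing it. The class name, the counters and the threshold rule are all assumptions made for this example.

```python
from collections import Counter, defaultdict


class MigrationController:
    """Moves a page toward the node that accesses it most (assumed policy)."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.home = {}                     # page -> node currently storing it
        self.hits = defaultdict(Counter)   # page -> per-node access counts

    def place(self, page, node):
        """Record the node that initially stores the page."""
        self.home[page] = node

    def record_access(self, page, node):
        """Count one access and return the node storing the page afterwards."""
        self.hits[page][node] += 1
        top_node, top_hits = self.hits[page].most_common(1)[0]
        current = self.home[page]
        lead = top_hits - self.hits[page][current]
        if top_node != current and lead >= self.threshold:
            self.home[page] = top_node     # migrate to the busiest accessor
        return self.home[page]
```

Requiring a lead of several accesses, rather than a simple majority, is one way to keep a page from bouncing between two nodes that use it about equally.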

The systems can also include a coherent replication controller for generating a copy, or select number of copies, of a portion of the addressable memory space maintained in the local persistent memory device of a first computer and for storing the copy in the local persistent memory device of a second computer. The coherent replication controller can maintain the coherency of the copies to provide coherent data replication.

The systems can also be understood to provide integrated control of data stored in volatile memory and in persistent memory. In such systems a volatile memory device has volatile storage for data signals, and the shared memory subsystem includes an element, typically a software module, for mapping a portion of the addressable memory space to a portion of the volatile storage. In these systems the volatile memory device can be comprised of a plurality of local volatile memory devices each coupled to a respective one of the plural computers, and the persistent memory device can be comprised of a plurality of local persistent memory devices each coupled to a respective one of the plural computers.

In these systems, a directory manager can track the mapped portions of the addressable memory space, and can include two sub-components: a disk directory manager for tracking portions of the addressable memory space mapped to the local persistent memory devices, and a RAM directory manager for tracking portions of the addressable memory space mapped to the local volatile memory devices. Optionally, a RAM cache system can operate one of the local volatile memory devices as a cache memory for cache storing data signals associated with recently accessed portions of the addressable memory space.

The systems can include additional elements including a paging element for remapping a portion of the addressable memory space between one of the local volatile memory devices and one of the local persistent memory devices; a policy controller for determining a resource available signal representative of storage available on each of the plural computers, and a paging element that remaps the portion of addressable memory space from a memory device of a first computer to a memory device of a second computer, responsive to the resource available signal; and a migration controller for moving portions of addressable memory space between the local volatile memory devices of the plural computers.

Optionally, the systems can include a hierarchy manager for organizing the plural computers into a set of hierarchical groups wherein each group includes at least one of the plural computers. Each group can include a group memory manager for migrating portions of addressable memory space as a function of the hierarchical groups.

The system can maintain coherency between copied portions of the memory space by including a coherent replication controller for generating a coherent copy of a portion of addressable memory space.

The system can generate or receive global address signals. Accordingly, the systems can include an address generator for generating a global address signal representative of a portion of addressable memory space. The address generator can include a spanning unit for generating global address signals as a function of a storage capacity associated with the persistent memory devices, to provide global address signals capable of logically addressing the storage capacity of the persistent memory devices.

In distributed systems, the directory manager can be a distributed directory manager for storing within the distributed memory space a directory signal representative of a storage location of a portion of the addressable memory space. The distributed directory manager can include a directory page generator for allocating a portion of the addressable memory space and for storing therein an entry signal representative of a portion of the directory signal. The directory page generator optionally includes a range generator for generating a range signal representative of a portion of the addressable memory space, and for generating the entry signal responsive to the range signal, to provide an entry signal representative of a portion of the directory signal that corresponds to the portion of the addressable memory space. Moreover, the distributed directory manager can include a linking system for linking the directory pages to form a hierarchical data structure of the linked directory pages as well as a range linking system for linking the directory pages, as a function of the range signal, to form a hierarchical data structure of linked directory pages.
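
A hierarchy of range-linked directory pages can be pictured as a tree in which each page covers a contiguous range of page numbers and each entry either links to a child page covering a sub-range or names a node. The following sketch is one possible reading, not the structure of the specification: the fan-out of 4, the class DirectoryPage and the convention that a range size is a power of the fan-out are assumptions made to keep the example small.

```python
FANOUT = 4  # entries per directory page (hypothetical; kept small for clarity)


class DirectoryPage:
    """Covers page numbers [start, start + size); size is a power of FANOUT."""

    def __init__(self, start, size):
        self.start = start
        self.size = size
        self.entries = [None] * FANOUT   # child DirectoryPage, node name, or None

    def _slot(self, page):
        if not self.start <= page < self.start + self.size:
            raise KeyError(page)
        span = self.size // FANOUT       # range covered by each entry
        return (page - self.start) // span, span

    def insert(self, page, node):
        """Record `node` for `page`, creating child pages on demand."""
        slot, span = self._slot(page)
        if span == 1:                    # leaf level: entry holds the node name
            self.entries[slot] = node
            return
        if self.entries[slot] is None:   # link a new page for the sub-range
            self.entries[slot] = DirectoryPage(self.start + slot * span, span)
        self.entries[slot].insert(page, node)

    def lookup(self, page):
        """Return the node recorded for `page`, or None if none is recorded."""
        slot, span = self._slot(page)
        entry = self.entries[slot]
        if span == 1 or entry is None:
            return entry
        return entry.lookup(page)
```

Child pages are allocated only for ranges that actually contain entries, so sparsely used regions of the address space cost a single empty slot in a parent page.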

As the data stored by the system can be homeless, in that the data has no fixed physical home, but can migrate, as resources and other factors dictate, between the memory devices of the network, a computer system according to the invention can include a directory page generator that has a node selector for generating a responsible node signal representative of a select one of the plural computers having location information for a portion of the shared address space. This provides a level of indirection that decouples the directory from the physical storage location of the data. Accordingly, the directory needs only to identify the node, or other device, that tracks the physical location of the data. This way, each time data migrates between physical storage locations, the directory does not have to be updated, since the node tracking the location of the data has not changed and still provides the physical location information.
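
The indirection described above can be sketched in a few lines. In this illustration the global directory maps a page to its responsible node, and only that node records where the data physically resides; the classes Node and GlobalDirectory, the function migrate and the update counter are hypothetical names introduced for the example.

```python
class Node:
    """A computer on the network that tracks locations for some pages."""

    def __init__(self, name):
        self.name = name
        self.locations = {}     # page -> name of the node holding the data


class GlobalDirectory:
    """Maps each page to its responsible node, not to the data itself."""

    def __init__(self):
        self.responsible = {}   # page -> Node
        self.updates = 0        # counts writes to the directory

    def assign(self, page, responsible_node, holder):
        self.responsible[page] = responsible_node
        self.updates += 1
        responsible_node.locations[page] = holder

    def locate(self, page):
        """Ask the responsible node where the page currently resides."""
        return self.responsible[page].locations[page]


def migrate(directory, page, new_holder):
    """Move a page: only the responsible node's record changes."""
    directory.responsible[page].locations[page] = new_holder
```

After a call to migrate, locate returns the new holder while the directory's own update count is unchanged, which is the property the passage attributes to homeless data.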

Accordingly, the system can include page generators that generate directory pages that carry information representative of a location monitor, such as a responsible computer node, that tracks a data storage location, to provide a directory structure for tracking homeless data. Moreover, the directory itself can be stored as pages within the virtual memory space. Therefore, the data storage location can store information representative of a directory page, to store the directory structure as pages of homeless data.

In another aspect, the invention can be understood as methods for providing a computer system having a shared addressable memory space. The method can include the steps of providing a network for carrying data signals representative of computer readable information, providing a hard disk, coupled to the network, and having persistent storage for data signals, providing plural computers, each having an interface, coupled to the data network, for exchanging data signals between the plural computers, and assigning a portion of the addressable memory space to a portion of the persistent storage of the hard disk to provide addressable persistent storage for data signals.


Thus, at least in its preferred embodiments, the invention provides an improved computer system.

Furthermore, at least in its preferred embodiments, the invention provides computer network systems that have adaptable system configurations for dynamically exploiting distributed resources and thereby increasing network productivity.

Moreover, at least in its preferred embodiments, the invention provides computer network systems that eliminate the need for application programs to provide low-level memory management services across two or more distributed nodes.

Yet further, the invention, at least in its preferred embodiments, provides computer network systems that have improved fault tolerance and that are more readily scalable for adding additional workstations as well as for the interconnection of two or more networks.

Some embodiments of the invention will now be described by way of example only and with reference to the accompanying drawings, in which:

FIG. 1 illustrates a distributed shared memory computer network according to the invention;
FIG. 2 is a functional block diagram that illustrates in more detail one distributed shared memory computer network of the type shown in FIG. 1;
FIG. 3 illustrates in more detail a shared memory subsystem suitable for practice with the network illustrated in FIG. 2;
FIG. 4 is a functional block diagram of one shared memory subsystem according to the invention;
FIG. 5 illustrates a directory page that can be provided by a shared memory subsystem of the type depicted in FIG. 4;
FIG. 6 illustrates a directory that can be distributed within a shared memory and formed of directory pages of the type illustrated in FIG. 5; and
FIG. 7 illustrates in functional block diagram form a system of the invention that employs a directory according to FIG. 6 for tracking portions of a distributed shared memory.

FIG. 1 illustrates a computer network 10 that provides a shared memory that spans the memory space of each node of the depicted computer network 10.

Specifically, FIG. 1 illustrates a computer network 10 that includes a plurality of nodes 12a-12c, each having a CPU 14, an operating system 16, an optional private memory device 18, and a shared memory subsystem 20. As further depicted in FIG. 1, each node 12a-12c connects via the shared memory subsystem 20 to a virtual shared memory 22. As will be explained in greater detail hereinafter, by providing the shared memory subsystem 20 that allows the node 12a-12c to access the virtual shared memory 22, the computer network 10 enables network nodes 12a-12c to communicate and share functionality using the same techniques employed by applications when communicating between applications running on the same machine. These techniques can employ object linking and embedding, dynamic link libraries, class registering, and other such techniques. Accordingly, the nodes 12 can employ the virtual shared memory 22 to exchange data and objects between application programs running on the different nodes 12 of the network 10.

In the embodiment depicted in FIG. 1, each node 12 can be a conventional computer system such as a commercially available IBM PC compatible computer system. The processor 14 can be any processor unit suitable for performing the data processing for that computer system. The operating system 16 can be any commercially available or proprietary operating system that includes, or can access, functions for accessing the local memory of the computer system and networking.

The private memory device 18 can be any computer memory device suitable for storing data signals representative of computer readable information. The private memory provides the node with local storage that can be kept inaccessible to the other nodes on the network. Typically the private memory device 18 includes a RAM, or a portion of a RAM memory, for temporarily storing data and application programs and for providing the processor 14 with memory storage for executing programs. The private memory device 18 can also include persistent memory storage, typically a hard disk unit or a portion of a hard disk unit, for the persistent storage of data.

The shared memory subsystem 20 depicted in FIG. 1 is an embodiment of the invention that couples between the operating system 16 and the virtual shared memory 22 and forms an interface between the operating system 16 and the virtual shared memory to allow the operating system 16 to access the virtual shared memory 22. The depicted shared memory subsystem 20 is a software module that operates as a stand-alone distributed shared memory engine. The depicted system is illustrative and other systems of the invention can be realized as shared memory subsystems that can be embedded into an application program, or be implemented as an embedded code of a hardware device. Other such applications can be practiced without departing from the scope of the invention.

The depicted virtual shared memory 22 illustrates a virtual shared memory that is accessible by each of the nodes 12a-12c via the shared memory subsystem 20. The virtual shared memory 22 can map to devices that provide physical storage for computer readable data, depicted in FIG. 1 as a plurality of pages 24a-24d. In one embodiment, the pages form portions of the shared memory space and divide the address space of the shared memory into page addressable memory spaces. For example, the address space can be paged into 4K byte sections. In other embodiments alternative granularity can be employed to manage the shared memory space. Each node 12a-12c through the shared memory subsystem 20 can access each page 24a-24d stored in the virtual shared memory 22. Each page 24a-24d represents a unique entry of computer data stored within the virtual shared memory 22. Each page 24a-24d is accessible to each one of the nodes 12a-12c, and alternatively, each node can store additional pages of data within the virtual shared memory 22. Each newly stored page of data can be accessible to each of the other nodes 12a-12c. Accordingly, the virtual shared memory 22 provides a system for sharing and communicating data between each node 12 of the computer network 10.

FIG. 2 illustrates in functional block diagram form a computer network 30 that has a distributed shared memory. In this embodiment, each node 12a-12c has a memory subsystem 32 that connects between the operating system 16 and the two local memory devices, the RAM 34 and the disk 36, and that further couples to a network 38 that couples to each of the depicted nodes 12a, 12b and 12c and to a network memory device 26.

More particularly, FIG. 2 illustrates a distributed shared memory network 30 that includes a plurality of nodes 12a, 12b and 12c, each including a processing unit 14, an operating system 16, a memory subsystem 32, a RAM 34, and a disk 36. FIG. 2 further depicts a computer network system 38 that connects between the nodes 12a, 12b and 12c and the network memory device 26. The network 38 provides a network communication system across these elements.

The illustrated memory subsystems 32a-32c that connect between the operating system 16a-16c, the memory elements 34a-34c, 36a-36c, and the network 38, encapsulate the local memories of each of the nodes to provide an abstraction of a shared virtual memory system that spans across each of the nodes 12a, 12b and 12c on the network 38. The memory subsystems 32a-32c can be software modules that act as distributors to map portions of the addressable memory space across the depicted memory devices. The memory subsystems further track the data stored in the local memory of each node 12 and further operate network connections with network 38 for transferring data between the nodes 12a-12c. In this way, the memory subsystems 32a-32c access and control each memory element on the network 38 to perform memory access operations that are transparent to the operating system 16. Accordingly, the operating system 16 interfaces with the memory subsystem 32 as an interface to a global memory space that spans each node 12a-12c on the network 38.

FIG. 2 further depicts that the system 30 provides a distributed shared memory that includes persistent storage for portions of the distributed memory. In particular, the depicted embodiment includes a memory subsystem, such as subsystem 32a, that interfaces to a persistent memory device, depicted as the disk 36a. The subsystem 32a can operate the persistent memory device to provide persistent storage for portions of the distributed shared memory space. As illustrated, each persistent memory device 36 depicted in FIG. 2 has a portion of the addressable memory space mapped onto it. For example, device 36a has the portions of the addressable memory space, Co, C^, Cg, mapped onto it, and provides persistent storage for data signals stored in those ranges of addresses.

Accordingly, the subsystem 32a can provide integrated control of persistent storage devices and electronic memory to allow the distributed shared memory space to span across both types of storage devices, and to allow portions of the distributed shared memory to move between persistent and electronic memory depending on predetermined conditions, such as recent usage.
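
To illustrate how pages might move between electronic and persistent memory according to recent usage, the sketch below keeps a bounded RAM tier and spills the least recently used page to a disk tier. The least-recently-used rule and the names TwoTierStore, ram and disk are assumptions made for the example; the specification names recent usage only as one possible condition.

```python
from collections import OrderedDict


class TwoTierStore:
    """Keeps recently used pages in RAM and spills older pages to disk."""

    def __init__(self, ram_capacity):
        self.ram_capacity = ram_capacity
        self.ram = OrderedDict()   # page -> data, least recently used first
        self.disk = {}             # page -> data (stands in for a hard disk)

    def write(self, page, data):
        self.disk.pop(page, None)  # a page lives in exactly one tier
        self.ram[page] = data
        self.ram.move_to_end(page) # mark as most recently used
        self._evict()

    def read(self, page):
        if page in self.ram:
            self.ram.move_to_end(page)
            return self.ram[page]
        data = self.disk.pop(page) # raises KeyError for an unknown page
        self.ram[page] = data      # promote the page back into RAM
        self._evict()
        return data

    def _evict(self):
        # Demote least recently used pages until RAM is within capacity.
        while len(self.ram) > self.ram_capacity:
            old_page, old_data = self.ram.popitem(last=False)
            self.disk[old_page] = old_data
```

A caller sees one page store through read and write; which tier holds a given page at any moment is decided inside the store.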

In one optional embodiment, the nodes of the network are organized into a hierarchy of groups. In this embodiment, the memory subsystems 32a-32c can include a hierarchy manager that provides hierarchical control for the distribution of data. This includes controlling the migration controller, and policy controller, which are discussed in detail below, to perform hierarchical data migration and load balancing, such that data migrates primarily between computers of the same group, and passes to other groups in hierarchical order. Resource distribution is similarly managed.

FIG. 3 illustrates in more detail one shared memory subsystem 40 according to the invention. FIG. 3 depicts a shared memory subsystem 40 that includes an interface 42, a DSM directory manager 44, a memory controller 46, a local disk cache controller 48, and a local RAM cache controller 50. FIG. 3 further depicts the network 54, an optional consumer of the DSM system, depicted as the service 58, the operating system 16, a disk driver 60, a disk element 62 and a RAM element 64.

The shared memory subsystem 40 depicted in FIG. 3 can encapsulate the memory management operations of the network node 12 to provide a virtual shared memory that can span across each node that connects into the network 54. Accordingly, each local node 12 views the network as a set of nodes that are each connected to a large shared computer memory.

The depicted interface 42 provides an entry point for the local node to access the shared memory space of the computer network. The interface 42 can couple directly to the operating system 16, to a distributed service utility such as the depicted DSM file system 58, to a distributed user-level service utility, or alternatively to any combination thereof.

The depicted interface 42 provides an API that is a memory oriented API. Thus, the illustrated interface 42 can export a set of interfaces that provide low-level control of the distributed memory. As illustrated in FIG. 3, the interface 42 exports the API to the operating system 16 or to the optional DSM service 58. The operating system 16 or the service employs the interface 42 to request standard memory management techniques, such as reading and writing from portions of the memory space. These portions of the memory space can be the pages as described above which can be 4K byte portions of the shared memory space, or other units of memory, such as objects or segments. Each page can be located within the shared memory space which is designated by a global address signal for that page of memory. The system can receive address signals from an application program or, optionally, can include a global address generator that generates the address signals. The address generator can include a spanning module that generates address signals for a memory space that spans the storage capacity of the network.

Accordingly, in one embodiment, the interface 42 receives requests to manipulate pages of the shared memory space. To this end, the interface 42 can comprise a software module that includes a library of functions that can be called by services, the OS 16, or other caller, or device. The function calls provide the OS 16 with an API of high level memory oriented services, such as read data, write data, and allocate memory. The implementation of the functions can include a set of calls to controls that operate the directory manager 44, and the local memory controller 46. Accordingly, the interface 42 can be a set of high level memory function calls to interface to the low-level functional elements of shared memory subsystem 40.
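
The layering described above, in which high level calls such as read data, write data and allocate memory are implemented by calls into a directory manager and a memory controller, can be sketched as a thin facade. Every class and method name below is hypothetical, and the two lower layers are reduced to in-memory stand-ins; the sketch shows only how the calls might be delegated.

```python
PAGE_SIZE = 4096  # 4K-byte pages, as in the text


class DirectoryManager:
    """Stand-in for the directory manager: tracks which pages are allocated."""

    def __init__(self):
        self._in_use = set()
        self._next_page = 0

    def allocate(self):
        page = self._next_page
        self._next_page += 1
        self._in_use.add(page)
        return page

    def is_allocated(self, page):
        return page in self._in_use


class MemoryController:
    """Stand-in for the local memory controller: stores page contents."""

    def __init__(self):
        self._pages = {}

    def load(self, page):
        return self._pages.get(page, bytes(PAGE_SIZE))  # unwritten pages read as zeros

    def store(self, page, data):
        self._pages[page] = data


class Interface:
    """High level memory calls implemented by calls to the lower layers."""

    def __init__(self, directory, memory):
        self.directory = directory
        self.memory = memory

    def allocate_memory(self):
        return self.directory.allocate()

    def write_data(self, page, data):
        self._check(page)
        if len(data) > PAGE_SIZE:
            raise ValueError("data does not fit in one page")
        self.memory.store(page, data.ljust(PAGE_SIZE, b"\0"))  # pad to a full page

    def read_data(self, page):
        self._check(page)
        return self.memory.load(page)

    def _check(self, page):
        if not self.directory.is_allocated(page):
            raise KeyError(f"page {page} is not allocated")
```

A caller such as an operating system or service would use only allocate_memory, write_data and read_data, and would not need to know how the lower layers are realized.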

FIG. 3 further depicts a DSM directory manager 44 that couples to the interface 42. The interface 42 passes request signals that represent requests to implement memory operations such as allocating a portion of memory, locking a portion of memory, mapping a portion of memory, or some other such memory function. The directory manager 44 manages a directory that can include mappings that can span across each memory device connected to the network 38 depicted in FIG. 2, including each RAM and disk element accessible by the network. The directory manager 44 stores a global directory structure that provides a map of the global address space. In one embodiment, as will be explained in greater detail hereinafter, the directory manager 44 provides a global directory that maps between global address signals and responsible nodes on the network. A responsible node stores information regarding the location and attributes of data associated with a respective global address, and optionally stores a copy of that page's data. Consequently, the directory manager 44 tracks information for accessing any address location within the virtual address space.

The control of the distributed shared memory can be
coordinated by the directory manager 44 and the memory
controller 46. The directory manager 44 maintains a directory
structure that can operate on a global address received from
the interface 42 and identify, for that address, a node on
the network that is responsible for maintaining the page
associated with that address of the shared memory space. Once
the directory manager 44 identifies which node is responsible
for maintaining a particular address, the directory manager
44 can identify a node that stores information for locating a
copy of the page, and make the call to the memory controller
46 of that node and pass to that node's memory controller the
memory request provided by the memory interface 42.
Accordingly, the depicted directory manager 44 is responsible
for managing a directory structure that identifies for each
page of the shared memory space a responsible node that
tracks the physical location of the data stored in the
respective page. Thus, the directory, rather than directly
providing the location of the page, can optionally identify a
responsible node, or other device, that tracks the location
of the page. This indirection facilitates maintenance of the
directory as pages migrate between nodes.
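
The benefit of this indirection can be illustrated with a short sketch. The structure below is only an assumption made for the example (a dictionary for the directory, a hypothetical `ResponsibleNode` class, node names "A", "B" and "C"); it shows that a page migration updates the responsible node's record while the directory itself is left untouched.

```python
class ResponsibleNode:
    """Tracks, per page, which node currently holds the data (the owner)."""

    def __init__(self, name):
        self.name = name
        self.owner_of = {}  # global address -> name of current owner node

    def locate(self, address):
        return self.owner_of[address]

    def record_migration(self, address, new_owner):
        # Only the responsible node's record changes when a page moves.
        self.owner_of[address] = new_owner


# Global directory: global address -> responsible node (hypothetical layout).
node_b = ResponsibleNode("B")
directory = {0x1000: node_b}
node_b.owner_of[0x1000] = "A"


def find_owner(address):
    responsible = directory[address]    # step 1: directory lookup
    return responsible.locate(address)  # step 2: ask the responsible node


before = find_owner(0x1000)
snapshot = dict(directory)
node_b.record_migration(0x1000, "C")    # page migrates from node A to node C
after = find_owner(0x1000)
```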

The memory controller 46 performs the low-level memory
access functions that physically store data within the memory
elements connected to the network. In the depicted
embodiment, the directory manager 44 of a first node can pass
a memory access request through the interface 42, to the
network module of the OS 16, and across the network 54 to a
second node that the directory manager 44 identifies as the
responsible node for the given address. The directory manager
44 can then query the responsible node to determine the
attributes and the current owner node of the memory page that
is associated with the respective global address. The owner
of the respective page is the network node that has control
over the memory storage element on which the data of the
associated page is stored. The memory controller 46 of the
owner can access, through the OS 16 of that node or through
any interface, the memory of the owner node to access the
data of the page that is physically stored on that owner
node.

In particular, as depicted in FIG. 3, the directory manager
44 couples to the network module 52, which couples to the
network 54. The directory manager can transmit to the network
module 52 a command and associated data that directs the
network interface 52 to pass a data signal to the owner node.
The owner node receives the memory request across network 54
and through network module 52, which passes the memory
request to the interface 42 of that owner node. The interface
42 couples to the memory controller 46 and can pass the
memory request to the local memory controller of that owner
node for operating the local storage elements, such as the
disk or RAM elements, to perform the requested memory
operation.

Once the owner node has performed the requested memory
operation, such as reading a page of data, the memory
subsystem 40 of the owner node can then transfer the page of
data, or a copy of the page of data, via the network 54 to
the node that originally requested access to that portion of
the shared memory. The page of data is transferred via the
network 54 to the network module 52 of the requesting node
and the shared memory subsystem 40 operates the memory
controller 46 to store in the local memory of the requesting
node a copy of the accessed data.

Accordingly, in one embodiment of the invention, when a
first node accesses a page of the shared memory space which
is not stored locally on that node, the directory manager 44
identifies a node that has a copy of the data stored in that
page and moves a copy of that data into the local memory of
the requesting node. The local memory storage, both volatile
and persistent, of the requesting node therefore becomes a
cache for pages that have been requested by that local node.
This embodiment is depicted in FIG. 3, which depicts a memory
controller that has a local disk cache controller 48 and a
local RAM cache controller 50. Both of these local cache
controllers can provide to the operating system 16, or other
consumer, pages of the shared memory space that are cache
stored in the local memory of the node, including local
persistent memory and local volatile memory.

The shared memory subsystem can include a coherent
replication controller that maintains coherency between
cached pages by employing a coherence through invalidation
process, a coherence through migration process or other
coherence process suitable for practice with the present
invention. The coherent replication controller can
automatically generate a copy of the data stored in each page
and can store the copy in a memory device that is separate
from the memory device of the original copy. This provides
for fault tolerant operation, as the failure of any one
memory device will not result in the loss of data. The
coherent replication controller can be a software model that
monitors all copies of pages kept in volatile memory and made
available for writing. The controller can employ any of the
coherency techniques named above, and can store tables of
location information that identifies the location information
for all generated copies.
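
A coherence through invalidation process can be sketched, under simplifying assumptions, as follows. The class `InvalidationCoherence`, its single-process dictionaries standing in for per-node caches, and the `latest` table are hypothetical; a real system would exchange invalidation messages over the network rather than mutate shared dictionaries.

```python
class InvalidationCoherence:
    """Keeps cached copies coherent by discarding stale copies on write."""

    def __init__(self, nodes):
        self.caches = {node: {} for node in nodes}  # node -> {address: data}
        self.latest = {}                            # address -> current data

    def write(self, node, address, data):
        # Invalidate every other node's copy before installing the new data.
        for other, cache in self.caches.items():
            if other != node:
                cache.pop(address, None)
        self.caches[node][address] = data
        self.latest[address] = data

    def read(self, node, address):
        cache = self.caches[node]
        if address not in cache:
            # Miss (never cached, or invalidated): fetch the current data.
            cache[address] = self.latest[address]
        return cache[address]


coherence = InvalidationCoherence(["A", "B"])
coherence.write("A", 7, "v1")
first_read = coherence.read("B", 7)    # node B caches a copy of "v1"
coherence.write("A", 7, "v2")          # node B's copy is invalidated
b_has_copy = 7 in coherence.caches["B"]
second_read = coherence.read("B", 7)   # node B re-fetches and sees "v2"
```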

FIG. 4 illustrates in greater detail one embodiment of a
shared memory subsystem according to the invention. The
shared memory subsystem 70 depicted in FIG. 4 includes a
remote operations element 74, a local RAM cache 76, a RAM
copyset 78, a global RAM directory 80, a disk copyset 82, a
global disk directory 84, a configuration manager 88, a
policy element 90, and a local disk cache 94. FIG. 4 further
depicts a network element 104, a physical memory 100, shared
data element 102, a physical file system 98, which is part of
the operating system 16, a configuration service 108, a
diagnostic service 110, and a memory access request 112. The
depicted subsystem 70 can be a computer program that couples
to the physical memory, file system, and network system of
the host node, or can be electrical circuit card assemblies
that interface to the host node, or can be a combination of
programs and circuit card assemblies.

The flow scheduler 72 depicted in FIG. 4 can orchestrate
the controls provided by an API of the subsystem 70. In one
embodiment, the flow scheduler 72 can be a state machine that
monitors and responds to the requests 112 and remote requests
through network 104, which can be instructions for memory
operations and which can include signals representative of
the global addresses being operated on. These memory
operation requests 112 can act as op-codes for primitive
operations on one or more global addresses. They can be read
and write requests, or other memory operations.
Alternatively, the flow scheduler 72 can be a program, such
as an interpreter, that provides an execution environment and
can map these op-codes into control flow programs called
applets. The applets can be independent executable programs
that employ both environment services, such as threading,
synchronization, and buffer management, and the elements
depicted in FIG. 4. The API is capable of being called from
both external clients, like a distributed shared memory file
system, as well as recursively by the applets and the other
elements 74-94 of the subsystem 70. Each element can provide
a level of encapsulation to the management of a particular
resource or aspect of the system. To this end, each element
can export an API consisting of functions to be employed by
the applets. This structure is illustrated in FIG. 4.
Accordingly, the flow scheduler 72 can provide an environment
to load and execute applets. The applets are dispatched by
the flow scheduler 72 on a per op-code basis and can perform
the control flow for sequential or parallel execution of an
element to implement the op-code on the specified global
address, such as a read or write operation. Optionally, the
flow scheduler 72 can include an element to change
dynamically the applet at run time as well as execute applets
in parallel and in interpreted mode.
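
The per-op-code dispatch of applets might, purely as a sketch, look like the following. The names `FlowScheduler`, `load`, `dispatch`, the op-code strings "READ" and "WRITE", and the dictionary used as memory are all assumptions made for the example; parallel execution and the environment services named above are omitted.

```python
class FlowScheduler:
    """Dispatches one applet per op-code (names here are hypothetical)."""

    def __init__(self):
        self._applets = {}

    def load(self, op_code, applet):
        # Loading under an existing op-code swaps the applet at run time.
        self._applets[op_code] = applet

    def dispatch(self, op_code, address, *args):
        applet = self._applets.get(op_code)
        if applet is None:
            raise KeyError(f"no applet loaded for op-code {op_code!r}")
        return applet(address, *args)


memory = {}  # stand-in for the elements an applet would call into


def write_applet(address, data):
    memory[address] = data
    return True


def read_applet(address):
    return memory.get(address)


scheduler = FlowScheduler()
scheduler.load("WRITE", write_applet)
scheduler.load("READ", read_applet)
scheduler.dispatch("WRITE", 0x2000, b"page data")
result = scheduler.dispatch("READ", 0x2000)
```

Because applets are looked up at dispatch time, replacing the applet bound to an op-code takes effect on the next request without restarting the scheduler.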

The depicted shared memory subsystem 70 includes a
bifurcated directory manager that includes the global RAM
directory 80 and the global disk directory 84. The global RAM
directory 80 is a directory manager that tracks information
that can provide the location of pages that are stored in the
volatile memory, typically RAM, of the network nodes. The
global disk directory 84 is a global disk directory manager
that manages a directory structure that tracks information
that can provide the location of pages that are stored on
persistent memory devices. Together, the global RAM directory
80 and the global disk directory 84 provide the shared memory
subsystem 70 with integrated directory management for pages
that are stored in persistent storage and volatile memory.

In one embodiment, a paging element can operate the RAM and
disk directory managers to remap portions of the addressable
memory space between one of the volatile memories and one of
the persistent memories. In the shared memory system, this
allows the paging element to remap pages from the volatile
memory of one node to a disk memory of another node.
Accordingly, the RAM directory manager passes control of that
page to the disk directory manager, which can then treat the
page as any other page of data. This allows for improved load
balancing, by removing data from RAM memory, and storing it
in the disk devices, under the control of the disk directory
manager.
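
The hand-off of a page from the RAM directory manager to the disk directory manager can be sketched as below. The `PagingElement` class, its two dictionaries, and the `page_out` method are hypothetical simplifications; the actual transfer of page data between nodes is not modeled.

```python
class PagingElement:
    """Remaps pages between RAM and disk directories (illustrative only)."""

    def __init__(self):
        self.ram_directory = {}   # global address -> node holding page in RAM
        self.disk_directory = {}  # global address -> node holding page on disk

    def page_out(self, address, target_node):
        """Remap a page from one node's RAM to target_node's disk."""
        source_node = self.ram_directory.pop(address)  # RAM directory lets go
        self.disk_directory[address] = target_node     # disk directory takes over
        return source_node


pager = PagingElement()
pager.ram_directory[5] = "A"
freed_from = pager.page_out(5, "B")  # page 5 leaves node A's RAM for node B's disk
```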

The local memory controller of the subsystem 70 is provided
by the local RAM cache 76 and the local disk cache 94. The
local RAM cache 76, which couples to the physical memory 100
of the local node, can access, as described above, the
virtual memory space of the local node to access data that is
physically stored within the RAM memory 100. Similarly, the
local disk cache 94 couples to the persistent storage device
98 and can access a physical location that maintains in the
local persistent storage data of the distributed shared
memory.

FIG. 4 also depicts a remote operations element 74 that
couples between the network 104 and the flow scheduler 72.
The remote operations element 74 negotiates the transfer of
data across the network 104 for moving portions of the data
stored in the shared memory space between the nodes of the
network. The remote operations element 74 can also request
services from remote peers, such as invalidation to help
maintain coherency, or for other reasons.

FIG. 4 also depicts a policy element 90 that can be a
software module that acts as a controller to determine the
availability of resources, such as printer capabilities,
hard-disk space, available RAM and other such resources. The
policy controller can employ any of the suitable heuristics
to direct the elements, such as the paging controller, disk
directory manager and other elements, to dynamically
distribute the available resources.

FIG. 4 further depicts a memory subsystem 70 that includes
a RAM copyset 78 and a disk copyset 82. These copysets can
manage copies of pages that are cached at a single node. The
disk copyset 82 can maintain information on copies of pages
that are stored in the local disk cache, which can be the
local persistent memory. Similarly, the RAM copyset 78 can
maintain information on copies of pages that are stored in
the local RAM cache, which can be the local RAM. These
copysets encapsulate indexing and storage of copyset data
that can be employed by applets or other executing code for
purposes of maintaining the coherency of data stored in the
shared memory space. The copyset elements can maintain
copyset data that identifies the pages cached by the host
node. Further, the copyset can identify the other nodes on
the network that maintain a copy of that page, and can
further identify for each page which of these nodes is the
owner node, wherein the owner node can be a node which has
write privileges to the page being accessed. The copysets
themselves can be stored in pages of the distributed shared
memory space.
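
One possible shape for copyset data is sketched below. The `CopysetEntry` and `Copyset` names, the use of node-name strings, and the rule that the first recorded holder is treated as owner until another is marked are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CopysetEntry:
    owner: str                                 # node with write privileges
    holders: set = field(default_factory=set)  # nodes caching a copy


class Copyset:
    """Per-page record of which nodes hold a copy and which node owns it."""

    def __init__(self):
        self._entries = {}  # global address -> CopysetEntry

    def add_copy(self, address, node, is_owner=False):
        # Assumption: the first node recorded is the owner until overridden.
        entry = self._entries.setdefault(address, CopysetEntry(owner=node))
        entry.holders.add(node)
        if is_owner:
            entry.owner = node

    def holders(self, address):
        return set(self._entries[address].holders)

    def owner(self, address):
        return self._entries[address].owner


copyset = Copyset()
copyset.add_copy(9, "A", is_owner=True)
copyset.add_copy(9, "B")
```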

The local RAM cache 76 provides storage for memory pages
and their attributes. In one embodiment, the local RAM cache
76 provides a global address index for accessing the cached
pages of the distributed memory and the attributes based on
that page. In this embodiment, the local RAM cache 76
provides the index by storing in memory a list of each global
address cached in the local RAM. With each listed global
address, the index provides a pointer into a buffer memory
and to the location of the page data. Optionally, with each
listed global address, the index can further provide
attribute information including a version tag representative
of the version of the data, a dirty bit representative of
whether the RAM cached data is a copy of the data held on
disk, or whether the RAM cached data has been modified but
not yet flushed to disk, a volatile bit to indicate if the
page is backed by backing store in persistent memory, and
other such attribute information useful for managing the
coherency of the stored data.
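
An index of this kind can be sketched as follows. The names `CacheEntry` and `LocalRamCache`, the free-list allocation, and the polarity chosen for the volatile bit (True meaning no backing store in persistent memory) are assumptions; the 4K page size follows the example page size mentioned for directory pages.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # 4K byte pages, as in the directory page example


@dataclass
class CacheEntry:
    buffer_offset: int      # pointer into the buffer memory holding the page
    version: int = 0        # version tag for the cached data
    dirty: bool = False     # True: modified in RAM, not yet flushed to disk
    volatile: bool = False  # assumed polarity: True means no backing store


class LocalRamCache:
    """Global-address index over a buffer of cached pages (illustrative)."""

    def __init__(self, page_count):
        self._buffer = bytearray(page_count * PAGE_SIZE)
        self._index = {}  # global address -> CacheEntry
        self._free = list(range(0, page_count * PAGE_SIZE, PAGE_SIZE))

    def insert(self, address, data, volatile=False):
        offset = self._free.pop()
        self._buffer[offset:offset + len(data)] = data
        self._index[address] = CacheEntry(offset, volatile=volatile)

    def write(self, address, data):
        entry = self._index[address]
        start = entry.buffer_offset
        self._buffer[start:start + len(data)] = data
        entry.version += 1
        entry.dirty = True  # differs from the disk copy until flushed

    def read(self, address, length):
        start = self._index[address].buffer_offset
        return bytes(self._buffer[start:start + length])

    def attributes(self, address):
        return self._index[address]


cache = LocalRamCache(page_count=2)
cache.insert(0x10, b"abc")
clean_read = cache.read(0x10, 3)
cache.write(0x10, b"xyz")
```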

In the embodiment depicted in FIG. 4, the memory subsystem
70 provides the node access to the distributed memory space
by the coordinated operation of the directory manager that
includes the global RAM directory 80 and the global disk
directory 84, the cache controller that includes the local
RAM cache and the local disk cache elements 76 and 94, and
the copyset elements which include the RAM copyset 78 and the
disk copyset 82.

The directory manager provides a directory structure that
indexes the shared address space. Continuing with the example
of a paged shared address space, the directory manager of the
subsystem 70 allows the host node to access, by global
addresses, pages of the shared memory space.

FIGs. 5 and 6 illustrate one example of a directory
structure that provides access to the shared memory space.
FIG. 5 depicts a directory page 120 that includes a page
header 122, directory entries 124 and 126, wherein each
directory entry includes a range field 130, a responsible
node field 132, and an address field 134. The directory pages
can be generated by a directory page generator that can be a
software module controlled by the directory manager. It will
be understood that the directory manager can generate
multiple directories, including one for the global disk and
one for the global RAM directories. The depicted directory
page 120 can be a page of the global address space, such as a
4K byte portion of the shared address space. Therefore, the
directory page can be stored in the distributed shared memory
space just as the other pages to which the directory pages
provide access.

As further depicted in FIG. 5, each directory page 120
includes a page header 122 that includes attribute
information for that page header, which is typically metadata
for the directory page, and further includes directory
entries such as the depicted directory entries 124 and 126,
which provide an index into a portion of the shared address
space wherein that portion can be one or more pages,
including all the pages of the distributed shared memory
space. The depicted directory page 120 includes directory
entries that index a selected range of global addresses of
the shared memory space. To this end, the directory generator
can include a range generator so that each directory entry
can include a range field 130 that describes the start of a
range of addresses that that entry locates.

Accordingly, each directory page 120 can include a
plurality of directory entries, such as entries 124 and 126,
that can subdivide the address space into a subset of address
ranges. For example, the depicted directory page 120 includes
two directory entries 124 and 126. The directory entries 124
and 126 can, for example, subdivide the address space into
two sub-portions. In this example, the start address range of
the directory entry 124 could be the base address of the
address space, and the start address range of the directory
entry 126 could be the address for the upper half of the
memory space. Accordingly, the directory entry 124 provides
an index for pages stored in the address space between the
base address and up to the mid-point of the memory space and,
in complement thereto, the directory entry 126 provides an
index to pages stored in the address space that ranges from
the mid-point of the address space to the highest address.
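
The two-entry example above can be sketched as a sorted list of entries searched by range start. The `DirectoryEntry` field names, the node names, the child page addresses, and the 16-bit address space are hypothetical values chosen for the illustration; only the three fields 130, 132 and 134 come from the description.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    range_start: int       # field 130: first address the entry covers
    responsible_node: str  # field 132
    child_page: int        # field 134: global address of the child page


def find_entry(entries, address):
    """Return the entry whose range contains address.

    entries must be sorted by range_start; each range runs up to the next
    entry's start, or to the top of the address space for the last entry.
    """
    starts = [entry.range_start for entry in entries]
    position = bisect_right(starts, address) - 1
    if position < 0:
        raise KeyError(f"address {address:#x} is below the first range")
    return entries[position]


# Two entries splitting a hypothetical 16-bit address space in half.
page_120 = [
    DirectoryEntry(0x0000, "node-1", 0x0100),  # plays the role of entry 124
    DirectoryEntry(0x8000, "node-2", 0x0200),  # plays the role of entry 126
]
lower = find_entry(page_120, 0x7FFF)
upper = find_entry(page_120, 0x8000)
```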

FIG. 5 further depicts a directory page 120 that includes,
in each directory entry, a responsible node field 132 and the
child page global address field 134. These fields 132, 134
provide further location information for the data stored in
pages within the address range identified in field 130.

FIG. 6 depicts a directory 140 formed from directory pages
similar to those depicted in FIG. 5. FIG. 6 depicts that the
directory 140 includes directory pages 142, 150-154, and
160-166. FIG. 6 further depicts that the directory 140
provides location information to the pages of the distributed
shared memory space depicted in FIG. 6 as pages 170-184.

The directory page 142 depicted in FIG. 6 acts like a root
directory page and can be located at a static address that is
known to each node coupled to the distributed address space.
The root directory page 142 includes three directory entries
144, 146, and 148. Each directory page depicted in FIG. 6 has
directory entries similar to those depicted in FIG. 5. For
example, directory entry 144 includes a variable Co which
represents the address range field 130, a variable Nj
representative of the field 132, and a variable Cs
representative of the field 134. The depicted root directory
page 142 subdivides the address space into three ranges
illustrated as an address range that extends between the
address Co and Cd, a second address range that extends
between the address Cd and Cg, and a third address range that
extends between Cg and the highest memory location of the
address space.

As further depicted in FIG. 6, each directory entry 144,
146, and 148 points to a subordinate directory page, depicted
as directory pages 150, 152, and 154, each of which further
subdivides the address range indexed by the associated
directory entry of the root directory 142. In FIG. 6, this
subdivision process continues as each of the directory pages
150, 152, and 154 each again have directory entries that
locate subordinate directory pages including the depicted
examples of directory pages 160, 162, 164, and 166.

The depicted example of directory pages 160, 162, 164, and
166 are each leaf entries. The leaf entries contain directory
entries, such as the directory entries 156 and 158 of the
leaf entry 160, that store a range field 130 and the
responsible node field 132. These leaf entries identify an
address and a responsible node for the page in the
distributed memory space that is being accessed, such as the
depicted pages 170-184. For example, as depicted in FIG. 6,
the leaf entry 156 points to the page 170 that corresponds to
the range field 130 of the leaf entry 156, which for a leaf
entry is the page being accessed. In this way, the directory
structure 140 provides location information for pages stored
in the distributed address space.

In the depicted embodiment of FIG. 6, a node selector can
select a responsible node for each page, as described above,
so that the leaf entry 156 provides information of the
address and responsible node of the page being located.
Accordingly, this directory tracks ownership and
responsibility for data, to provide a level of indirection
between the directory and the physical location of the data.
During a memory access operation, the memory subsystem 70
passes to the responsible node indicated in the leaf entry
156 the address of the page being accessed. The shared memory
subsystem of that node can identify a node that stores a copy
of the page being accessed, including the owner node. This
identification of a node having a copy can be performed by
the RAM copyset or disk copyset of the responsible node. The
node having a copy stored in its local physical memory, such
as the owner node, can employ its local cache elements,
including the local RAM cache and local disk cache, to
identify from the global address signal a physical location
of the data stored in the page being accessed. The cache
element can employ the operating system of the owner node to
access the memory device that maintains that physical
location in order that the data stored in the page can be
accessed. For a read-memory operation, or for other similar
operations, the data read from the physical memory of the
owner node can be passed via the network to the memory
subsystem of the node requesting the read and subsequently
stored into the virtual memory space of the requesting node
for use by that node.
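
The access path just described (walk the directory to a leaf, ask the responsible node for the owner, read from the owner) can be sketched as below. The `DirectoryPage` class, the tuple layout of entries, the small integer addresses, and the dictionaries standing in for copysets and node-local storage are assumptions; network transfer and caching at the requesting node are left out.

```python
from bisect import bisect_right


class DirectoryPage:
    def __init__(self, entries, leaf=False):
        # entries: sorted list of (range_start, responsible_node, child_page)
        self.entries = entries
        self.leaf = leaf

    def lookup(self, address):
        starts = [start for start, _, _ in self.entries]
        index = bisect_right(starts, address) - 1
        if index < 0:
            raise KeyError(address)
        return self.entries[index]


def resolve(root, address):
    """Walk from the root page to a leaf; return the responsible node."""
    page = root
    while True:
        _, responsible, child = page.lookup(address)
        if page.leaf:
            return responsible
        page = child


def read_page(root, copysets, stores, address):
    responsible = resolve(root, address)
    owner = copysets[responsible][address]  # responsible node names the owner
    return stores[owner][address]           # owner supplies the page data


# Hypothetical two-level directory over pages 0, 4 and 8.
leaf_low = DirectoryPage([(0, "B", None), (4, "C", None)], leaf=True)
leaf_high = DirectoryPage([(8, "B", None)], leaf=True)
root = DirectoryPage([(0, "A", leaf_low), (8, "A", leaf_high)])
copysets = {"B": {0: "A", 8: "C"}, "C": {4: "C"}}
stores = {"A": {0: b"zero"}, "C": {4: b"four", 8: b"eight"}}
```

Note that the directory never names the owner directly: page 8 is resolved to responsible node "B", and only that node's copyset records that node "C" holds the data.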

With reference again to FIG. 6, it can be seen that the
depicted directory structure 140 comprises a hierarchical
structure. To this end, the directory structure 140 provides
a structure that continually subdivides the memory space into
smaller and smaller sections. Further, each section is
represented by directory pages of the same structure, but
indexes address spaces of different sizes. As pages are
created or deleted, a linker inserts or deletes the pages
from the directory. In one embodiment, the linker is a
software module for linking data structures. The linker can
operate responsive to the address ranges to provide the
depicted hierarchical structure. Accordingly, the depicted
directory 140 provides a scalable directory for the shared
address space. Moreover, the directory pages are stored in
the distributed address space and maintained by the
distributed shared memory system. A root for the directory
can be stored in known locations to allow for bootstrap of
the system. Consequently, commonly used pages are copied and
distributed, and rarely used pages are shuffled off to disk.
Similarly, directory pages will migrate to those nodes that
access them most, providing a degree of self-organization
that reduces network traffic.

FIG. 7 depicts the directory of FIG. 6 being employed by a
system according to the invention. In particular, FIG. 7
depicts a system 200 that includes two nodes, 206a and 206b,
a directory structure 140, and a pair of local memories
having volatile memory devices 64a and 64b, and persistent
memory devices 62a and 62b. Depicted node 206a includes an
address consumer 208a, a global address 210a, an interface
42a, a directory manager 44a and a memory controller 46a.
Node 206b has corresponding elements. The nodes are connected
by the network 54. A directory 140 having a root page,
directory pages A-F and pages 1-5 is further depicted.

Each node 206a and 206b operates as discussed above. The
depicted address consumers 208a and 208b can be an
application program, file system, hardware device or any
other such element that requests access to the virtual
memory. In operation, the address consumers 208a and 208b
request an address, or range of addresses, and the directory
manager can include a global address generator that provides
the consumer with the requested address, or a pointer to the
requested address. As addresses get generated, the respective
directory managers 44a and 44b generate directory pages and
store the pages in the directory structure 140. As depicted,
the directory structure 140 tracks the portions of the
address space being employed by the system 200, and physical
storage for each page is provided within the local memories.

As shown in FIG. 7, the data associated with the directory
pages are distributively stored across the two local memories
and duplicate copies can exist. As described above and now
illustrated in FIG. 7, the data can move between different
local memories and also move, or page, between volatile and
persistent storage. The data movement can be responsive to
data requests made by memory users like application programs,
or by operation of the migration controller described above.

As also described above, the movement of data between
different memory locations can occur without requiring
changes to the directory 140. This is achieved by providing a
directory 140 that is decoupled from the physical location of
the data by employing a pointer to a responsible node that
tracks the data storage location. Accordingly, although the
data storage location can change, the responsible node can
remain constant, thereby avoiding any need to change the
directory 140.

It will be understood by those of ordinary skill in the art
that certain modifications, additions, and subtractions can
be made to the embodiments described above without departing
from the spirit and scope of the invention. Accordingly, the
invention described above is not to be limited to the
illustrated embodiments and is to be understood by the claims
set forth below.



Claims 

1. A computer system having a shared addressable memory
space, comprising

a data network for carrying data signals representative of
computer readable information,

a persistent memory device, coupled to said data network and
having persistent storage for data signals;

plural computers, each having

an interface, coupled to the data network, for accessing said
data network to exchange data signals therewith, and

a shared memory subsystem for mapping a portion of said
addressable memory space to a portion of said persistent
storage to provide thereby addressable persistent storage for
data signals.

2. A computer system as claimed in claim 1 wherein said
persistent memory device comprises a plurality of local
persistent memory devices each coupled to a respective one of
said plural computers.

3. A computer system as claimed in claim 2 further comprising

a distributor for mapping portions of said addressable memory
space across said plurality of local persistent memory
devices, to provide an addressable memory space distributed
across said local persistent storage of said computers.

4. A computer system as claimed in claim 3 further comprising

a disk directory manager for tracking said mapped portions of
said addressable memory space to provide information
representative of said local persistent memory device having
said portion of said addressable memory space mapped thereon.

5. A computer system as claimed in claim 2 further comprising

a cache system for operating one of said local persistent
memory devices as a cache memory for cache storing data
signals associated with recently accessed portions of said
addressable memory space.

6. A computer system as claimed in claim 2 further comprising

a migration controller for selectively moving portions of
said addressable memory space between said local persistent
memory devices of said plural computers.

7. A computer system as claimed in claim 2 further comprising

a replication controller for generating a copy of a portion
of said addressable memory space maintained in said local
persistent memory device of a first computer and for storing
said copy in said local persistent memory device of a second
computer.



8. A computer system as claimed in claim 1 further comprising

a volatile memory device having volatile storage for data
signals, and

wherein said shared memory subsystem includes means for
mapping a portion of said addressable memory space to a
portion of said volatile storage.

9. A computer system as claimed in claim 8 wherein

said volatile memory device comprises a plurality of local
volatile memory devices each coupled to a respective one of
said plural computers, and

said persistent memory device comprises a plurality of local
persistent memory devices each coupled to a respective one of
said plural computers.



10. A computer system as claimed in claim 9 further
comprising a distributor, coupled to said data network, for
mapping portions of said addressable memory space across said
storage provided by said local persistent and volatile memory
devices.

11. A computer system as claimed in claim 10 further
comprising

a directory manager for tracking said mapped portions of said
addressable memory space.

12. A computer system as claimed in claim 10 further
comprising

a disk directory manager for tracking portions of said
addressable memory space mapped to said local persistent
memory devices.

13. A computer system as claimed in claim 11 wherein said
directory manager includes

a RAM directory manager for tracking portions of said
addressable memory space mapped to said local volatile memory
devices.

14. A computer system as claimed in claim 9 further
comprising

a RAM cache system for operating one of said local volatile
memory devices as a cache memory for cache storing data
signals associated with recently accessed portions of said
addressable memory space.

15. A computer system as claimed in claim 9 further
comprising

a paging element for remapping a portion of said addressable
memory space between one of said local volatile memory
devices and one of said local persistent memory devices.

16. A computer system as claimed in claim 15 further
comprising

a policy controller for determining a resource available
signal representative of storage available on each of said
plural computers, and wherein said paging element remaps said
portion of addressable memory space from a memory device of a
first computer to a memory device of a second computer,
responsive to said resource available signal.



17. A computer system as claimed in claim 9 further
comprising

a migration controller for moving portions of addressable
memory space between said local volatile memory devices of
said plural computers.

18. A computer system as claimed in claim 9 further
comprising

a hierarchy manager for organizing said plural computers into
a set of hierarchical groups wherein each group includes at
least one of said plural computers.

19. A computer system as claimed in claim 18 wherein each
said group includes

a group memory manager for migrating portions of addressable
memory space as a function of said hierarchical groups.

20. A computer system as claimed in claim 9 further
comprising

a coherent replication controller for generating a coherent
copy of a portion of addressable memory space.

21. A computer system as claimed in claim 1 further
comprising

an address generator for generating a global address signal
representative of a portion of addressable memory space.

22. A computer system as claimed in claim 21, wherein said
address generator includes a spanning unit for generating
global address signals as a function of a storage capacity
associated with said persistent memory devices, to provide
global address signals capable of logically addressing said
storage capacity of said persistent memory devices.

23. A computer system as claimed in claim 3 further
comprising

a distributed directory manager for storing within said
distributed memory space, a directory signal representative
of a storage location of a portion of said addressable memory
space.

24. A computer system as claimed in claim 23 wherein said
distributed directory manager includes

a directory page generator for allocating a portion of said
addressable memory space and for storing therein an entry
signal representative of a portion of said directory signal.

25. A computer system as claimed in claim 24 wherein said
directory page generator includes

a range generator for generating a range signal
representative of a portion of said addressable memory space,
and for generating said entry signal responsive to said range
signal, to provide an entry signal representative of a
portion of said directory signal that corresponds to said
portion of said addressable memory space.

26. A computer system as claimed in claim 25 wherein
said distributed directory manager includes a linking
system for linking said directory pages to form
a hierarchical data structure of said linked directory
pages.

27. A computer system as claimed in claim 25 wherein
said distributed directory manager includes a range
linking system for linking said directory pages, as a
function of said range signal, to form a hierarchical
data structure of linked directory pages.

28. A computer system as claimed in claim 24 wherein
said directory page generator includes a node
selector for generating a responsible node signal
representative of a select one of said plural computers
having location information for a portion of said
shared address space.

29. A computer system having a shared addressable 
memory space, comprising 

a data network for carrying data signals repre- 
sentative of computer readable information, 
a hard-disk, coupled to said data network, and
having persistent storage for data signals,
plural computers, each having an interface,
coupled to said data network for exchanging
data signals between said plural computers,
and
a shared memory subsystem, coupled to said
data network, for assigning a portion of said
addressable memory space to a portion of said
persistent storage of said hard disk to provide
thereby addressable persistent storage for data
signals.

30. A computer system as claimed in claim 29 further
comprising

a volatile memory device providing volatile
storage for storing data signals, and wherein said
shared memory subsystem includes means for
mapping a portion of said addressable memory
space to a portion of said volatile storage.

31. A computer system as claimed in claim 29 further
comprising

a page generator for generating a directory page
that carries information representative of a location
monitor that tracks a data storage location, to
provide a directory structure for tracking homeless
data.

32. A computer system as claimed in claim 31 wherein
said data storage location stores information
representative of a directory page, to store said directory
structure as pages of homeless data.



33. A method for providing a computer system having
a shared addressable memory space, comprising
the steps of

providing a network for carrying data signals
representative of computer readable information,

providing a hard-disk, coupled to said network,
and having persistent storage for data signals,
providing plural computers, each having an
interface, coupled to said data network, for
exchanging data signals between said plural
computers, and

assigning a portion of said addressable memory
space to a portion of said persistent storage
of said hard disk to provide addressable
persistent storage for data signals.
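
By way of illustration only, the assigning step recited in claim 33 might be
sketched as follows. This is a minimal sketch and not the claimed
implementation: it assumes a single hard disk addressed by byte offset and a
fixed page size, and the names SharedAddressSpace, Extent, assign and resolve,
together with the 4096-byte page size, are hypothetical and do not appear in
the claims.

```python
from bisect import bisect_right
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page granularity; not specified by the claims


@dataclass(frozen=True)
class Extent:
    """A portion of the shared address space backed by disk storage."""
    start: int        # first shared address covered by this extent
    length: int       # size of the extent in bytes
    disk_offset: int  # byte offset of the backing storage on the disk


class SharedAddressSpace:
    """Hypothetical table mapping shared addresses to persistent storage."""

    def __init__(self):
        self._extents = []  # kept sorted by start address

    def assign(self, start, length, disk_offset):
        """Assign a page-aligned portion of the address space to the disk."""
        if length <= 0 or start % PAGE_SIZE or length % PAGE_SIZE:
            raise ValueError("extent must be page-aligned and non-empty")
        for e in self._extents:
            # Reject any extent that overlaps an existing assignment.
            if start < e.start + e.length and e.start < start + length:
                raise ValueError("extent overlaps an existing assignment")
        self._extents.append(Extent(start, length, disk_offset))
        self._extents.sort(key=lambda e: e.start)

    def resolve(self, address):
        """Return the disk offset backing an address, or None if unassigned."""
        starts = [e.start for e in self._extents]
        i = bisect_right(starts, address) - 1
        if i >= 0:
            e = self._extents[i]
            if address < e.start + e.length:
                return e.disk_offset + (address - e.start)
        return None
```

In this sketch, any computer holding the table could translate a shared
address into a location in the persistent storage of the disk; how such a
table would be distributed among the plural computers is left open here.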
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